Overview

Dataset statistics

Statistic | Dataset A | Dataset B
Number of variables | 12 | 12
Number of observations | 446 | 446
Missing cells | 431 | 423
Missing cells (%) | 8.1% | 7.9%
Duplicate rows | 0 | 0
Duplicate rows (%) | 0.0% | 0.0%
Total size in memory | 45.3 KiB | 45.3 KiB
Average record size in memory | 104.0 B | 104.0 B
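
The percentage and per-record figures above follow from the raw counts. A minimal Python sketch of that arithmetic, assuming the missing-cell percentage is missing cells divided by (variables × observations) and the average record size is total size divided by observations. The function names are illustrative and are not part of ydata-profiling.

```python
def missing_cells_pct(missing_cells, n_vars, n_obs):
    """Share of empty cells among all cells, as a percentage."""
    return 100.0 * missing_cells / (n_vars * n_obs)


def avg_record_size_bytes(total_kib, n_obs):
    """Average bytes per row, given the total size in KiB."""
    return total_kib * 1024 / n_obs


# Counts taken from the dataset statistics in this report
print(f"{missing_cells_pct(431, 12, 446):.1f}%")      # Dataset A: 8.1%
print(f"{missing_cells_pct(423, 12, 446):.1f}%")      # Dataset B: 7.9%
print(f"{avg_record_size_bytes(45.3, 446):.1f} B")    # both: 104.0 B
```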

Variable types

Type | Dataset A | Dataset B
Numeric | 5 | 5
Categorical | 7 | 7

Alerts

Dataset A | Dataset B | Alert
Name has a high cardinality: 446 distinct values | Name has a high cardinality: 446 distinct values | High Cardinality
Ticket has a high cardinality: 375 distinct values | Ticket has a high cardinality: 375 distinct values | High Cardinality
Cabin has a high cardinality: 82 distinct values | Cabin has a high cardinality: 93 distinct values | High Cardinality
Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation
Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation
Age has 84 (18.8%) missing values | Age has 89 (20.0%) missing values | Missing
Cabin has 347 (77.8%) missing values | Cabin has 334 (74.9%) missing values | Missing
Name is uniformly distributed | Name is uniformly distributed | Uniform
Ticket is uniformly distributed | Ticket is uniformly distributed | Uniform
Cabin is uniformly distributed | Cabin is uniformly distributed | Uniform
PassengerId has unique values | PassengerId has unique values | Unique
Name has unique values | Name has unique values | Unique
SibSp has 307 (68.8%) zeros | SibSp has 289 (64.8%) zeros | Zeros
Parch has 343 (76.9%) zeros | Parch has 326 (73.1%) zeros | Zeros
Fare has 9 (2.0%) zeros | Fare has 10 (2.2%) zeros | Zeros

Reproduction

Property | Dataset A | Dataset B
Analysis started | 2023-03-08 14:20:12.021364 | 2023-03-08 14:20:18.887236
Analysis finished | 2023-03-08 14:20:18.883505 | 2023-03-08 14:20:23.675310
Duration | 6.86 seconds | 4.79 seconds
Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0
Download configuration | config.json | config.json

Variables

PassengerId
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 456.97982 | 429.06278
Minimum | 1 | 2
Maximum | 889 | 890
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 1 | 2
5-th percentile | 47.25 | 39.75
Q1 | 251.5 | 213.25
Median | 457 | 423.5
Q3 | 668.25 | 637.75
95-th percentile | 850.25 | 844.5
Maximum | 889 | 890
Range | 888 | 888
Interquartile range (IQR) | 416.75 | 424.5
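
The last two rows are derived from the others: the range is maximum minus minimum, and the interquartile range is Q3 minus Q1. A small Python sketch of that check, using the PassengerId quantiles from this section (the helper name `spread` is illustrative):

```python
def spread(minimum, q1, q3, maximum):
    """Return (range, interquartile range) from four summary quantiles."""
    return maximum - minimum, q3 - q1


# PassengerId quantiles reported in this section
print(spread(1, 251.5, 668.25, 889))    # Dataset A: (888, 416.75)
print(spread(2, 213.25, 637.75, 890))   # Dataset B: (888, 424.5)
```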

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 251.33269 | 254.03925
Coefficient of variation (CV) | 0.54998642 | 0.59207945
Kurtosis | -1.1117045 | -1.128275
Mean | 456.97982 | 429.06278
Median Absolute Deviation (MAD) | 208.5 | 212.5
Skewness | -0.034598277 | 0.08687862
Sum | 203813 | 191362
Variance | 63168.123 | 64535.942
Monotonicity | Not monotonic | Not monotonic
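
Several of these statistics are related to one another. As a rough consistency check, the sketch below recomputes the coefficient of variation (standard deviation divided by mean) and the variance (standard deviation squared) from the reported PassengerId values. The function names are illustrative, and small differences are expected because the report rounds its inputs.

```python
def coefficient_of_variation(std, mean):
    """CV: standard deviation relative to the mean."""
    return std / mean


def variance_from_std(std):
    """Variance is the square of the standard deviation."""
    return std ** 2


# PassengerId, Dataset A then Dataset B
print(coefficient_of_variation(251.33269, 456.97982))
print(coefficient_of_variation(254.03925, 429.06278))
print(variance_from_std(251.33269))
print(variance_from_std(254.03925))
```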
Histogram with fixed size bins (bins=50)
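
The histogram image itself is not preserved here. As an illustration of what "fixed size bins" means, the following Python sketch counts values in equal-width intervals between the minimum and maximum. This is an assumption about the binning scheme (it mirrors the common equal-width convention in which the maximum is counted in the last bin), not a copy of the report's own code, and `fixed_width_histogram` is a hypothetical helper.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum sits on the right edge, so clamp it into the last bin.
        # If all values are equal (width 0), everything goes into bin 0.
        idx = min(int((v - lo) / width), bins - 1) if width else 0
        counts[idx] += 1
    return counts


print(fixed_width_histogram(list(range(1, 11)), 3))  # [3, 3, 4]
```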
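
Common values

Dataset A
Value | Count | Frequency (%)
165 | 1 | 0.2%
573 | 1 | 0.2%
205 | 1 | 0.2%
229 | 1 | 0.2%
377 | 1 | 0.2%
183 | 1 | 0.2%
273 | 1 | 0.2%
325 | 1 | 0.2%
438 | 1 | 0.2%
602 | 1 | 0.2%
Other values (436) | 436 | 97.8%

Dataset B
Value | Count | Frequency (%)
66 | 1 | 0.2%
228 | 1 | 0.2%
884 | 1 | 0.2%
42 | 1 | 0.2%
695 | 1 | 0.2%
126 | 1 | 0.2%
242 | 1 | 0.2%
486 | 1 | 0.2%
106 | 1 | 0.2%
348 | 1 | 0.2%
Other values (436) | 436 | 97.8%

Extreme values (minimum 10 values)

Dataset A
Value | Count | Frequency (%)
1 | 1 | 0.2%
5 | 1 | 0.2%
6 | 1 | 0.2%
7 | 1 | 0.2%
8 | 1 | 0.2%
14 | 1 | 0.2%
15 | 1 | 0.2%
17 | 1 | 0.2%
19 | 1 | 0.2%
20 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
2 | 1 | 0.2%
4 | 1 | 0.2%
5 | 1 | 0.2%
7 | 1 | 0.2%
8 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%
12 | 1 | 0.2%
14 | 1 | 0.2%
18 | 1 | 0.2%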

Survived
Categorical

Statistic | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Value | Count (Dataset A) | Count (Dataset B)
0 | 272 | 260
1 | 174 | 186

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 2 | 2
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
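
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The mapping from two-letter codes to long names covers only the categories that appear in this report, `category_counts` is a hypothetical helper, and scripts and blocks are left out because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts([0, 1, 0, 0, 1]))        # digits only
print(category_counts(["Weir, Col. John"]))    # a value of the Name variable
```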

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 0 | 1
2nd row | 1 | 0
3rd row | 0 | 0
4th row | 0 | 0
5th row | 1 | 0

Common Values

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 260 | 58.3%
1 | 186 | 41.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A

[Figure: common values plot]

Dataset B

[Figure: common values plot]

Words

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 260 | 58.3%
1 | 186 | 41.7%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 260 | 58.3%
1 | 186 | 41.7%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Most frequent character per category

Decimal Number

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 260 | 58.3%
1 | 186 | 41.7%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Common | 446 | 100.0%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 260 | 58.3%
1 | 186 | 41.7%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
0 | 272 | 61.0%
1 | 174 | 39.0%

Dataset B
Value | Count | Frequency (%)
0 | 260 | 58.3%
1 | 186 | 41.7%

Pclass
Categorical

Statistic | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Value | Count (Dataset A) | Count (Dataset B)
3 | 247 | 229
1 | 114 | 125
2 | 85 | 92

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 3 | 3
2nd row | 3 | 1
3rd row | 1 | 1
4th row | 1 | 3
5th row | 3 | 1

Common Values

Dataset A
Value | Count | Frequency (%)
3 | 247 | 55.4%
1 | 114 | 25.6%
2 | 85 | 19.1%

Dataset B
Value | Count | Frequency (%)
3 | 229 | 51.3%
1 | 125 | 28.0%
2 | 92 | 20.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A

[Figure: common values plot]

Dataset B

[Figure: common values plot]

Words

Dataset A
Value | Count | Frequency (%)
3 | 247 | 55.4%
1 | 114 | 25.6%
2 | 85 | 19.1%

Dataset B
Value | Count | Frequency (%)
3 | 229 | 51.3%
1 | 125 | 28.0%
2 | 92 | 20.6%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
3 | 247 | 55.4%
1 | 114 | 25.6%
2 | 85 | 19.1%

Dataset B
Value | Count | Frequency (%)
3 | 229 | 51.3%
1 | 125 | 28.0%
2 | 92 | 20.6%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 446 | 100.0%

Most frequent character per category

Decimal Number

Dataset A
Value | Count | Frequency (%)
3 | 247 | 55.4%
1 | 114 | 25.6%
2 | 85 | 19.1%

Dataset B
Value | Count | Frequency (%)
3 | 229 | 51.3%
1 | 125 | 28.0%
2 | 92 | 20.6%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
Common | 446 | 100.0%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
3 | 247 | 55.4%
1 | 114 | 25.6%
2 | 85 | 19.1%

Dataset B
Value | Count | Frequency (%)
3 | 229 | 51.3%
1 | 125 | 28.0%
2 | 92 | 20.6%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
3 | 247 | 55.4%
1 | 114 | 25.6%
2 | 85 | 19.1%

Dataset B
Value | Count | Frequency (%)
3 | 229 | 51.3%
1 | 125 | 28.0%
2 | 92 | 20.6%

Name
Categorical

Statistic | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Value (Dataset A) | Count
Panula, Master. Eino Viljami | 1
Flynn, Mr. John Irwin ("Irving") | 1
Cohen, Mr. Gurshon "Gus" | 1
Fahlstrom, Mr. Arne Jonas | 1
Landergren, Miss. Aurora Adelia | 1
Other values (441) | 441

Value (Dataset B) | Count
Moubarek, Master. Gerios | 1
Lovell, Mr. John Hall ("Henry") | 1
Banfield, Mr. Frederick James | 1
Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott) | 1
Weir, Col. John | 1
Other values (441) | 441

Length

Statistic | Dataset A | Dataset B
Max length | 82 | 82
Median length | 48 | 49.5
Mean length | 26.840807 | 27.159193
Min length | 13 | 13

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 11971 | 12113
Distinct characters | 59 | 59
Distinct categories | 7 | 7
Distinct scripts | 2 | 2
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 446 | 446
Unique (%) | 100.0% | 100.0%

Sample

Row | Dataset A | Dataset B
1st row | Panula, Master. Eino Viljami | Moubarek, Master. Gerios
2nd row | Jalsevac, Mr. Ivan | Futrelle, Mr. Jacques Heath
3rd row | Reuchlin, Jonkheer. John George | Silvey, Mr. William Baird
4th row | Guggenheim, Mr. Benjamin | Pavlovic, Mr. Stefo
5th row | Sunderland, Mr. Victor Francis | Thayer, Mr. John Borland

Common Values

Dataset A
Value | Count | Frequency (%)
Panula, Master. Eino Viljami | 1 | 0.2%
Flynn, Mr. John Irwin ("Irving") | 1 | 0.2%
Cohen, Mr. Gurshon "Gus" | 1 | 0.2%
Fahlstrom, Mr. Arne Jonas | 1 | 0.2%
Landergren, Miss. Aurora Adelia | 1 | 0.2%
Asplund, Master. Clarence Gustaf Hugo | 1 | 0.2%
Mellinger, Mrs. (Elizabeth Anne Maidment) | 1 | 0.2%
Sage, Mr. George John Jr | 1 | 0.2%
Richards, Mrs. Sidney (Emily Hocking) | 1 | 0.2%
Slabenoff, Mr. Petco | 1 | 0.2%
Other values (436) | 436 | 97.8%

Dataset B
Value | Count | Frequency (%)
Moubarek, Master. Gerios | 1 | 0.2%
Lovell, Mr. John Hall ("Henry") | 1 | 0.2%
Banfield, Mr. Frederick James | 1 | 0.2%
Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott) | 1 | 0.2%
Weir, Col. John | 1 | 0.2%
Nicola-Yarred, Master. Elias | 1 | 0.2%
Murphy, Miss. Katherine "Kate" | 1 | 0.2%
Lefebre, Miss. Jeannie | 1 | 0.2%
Mionoff, Mr. Stoytcho | 1 | 0.2%
Davison, Mrs. Thomas Henry (Mary E Finck) | 1 | 0.2%
Other values (436) | 436 | 97.8%
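
The common-values tables list the ten most frequent values and fold everything else into an "Other values (n)" row, where n is the number of remaining distinct values. A minimal Python sketch of that layout, assuming ties are broken by first appearance (`common_values` is a hypothetical helper, not the report's own code):

```python
from collections import Counter


def common_values(values, top=10):
    """Return (value, count, percent) rows, with the tail folded into one row."""
    counts = Counter(values)
    total = len(values)
    head = counts.most_common(top)
    rows = [(value, n, 100.0 * n / total) for value, n in head]
    rest_distinct = len(counts) - len(head)
    if rest_distinct:
        rest_count = total - sum(n for _, n in head)
        label = f"Other values ({rest_distinct})"
        rows.append((label, rest_count, 100.0 * rest_count / total))
    return rows


# 446 distinct values, as for Name: the tail row covers 436 values
label, count, pct = common_values(list(range(446)))[-1]
print(label, count, f"{pct:.1f}%")  # Other values (436) 436 97.8%
```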

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Dataset B


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Words

Dataset A
Value | Count | Frequency (%)
mr | 257 | 14.2%
miss | 99 | 5.5%
mrs | 62 | 3.4%
william | 32 | 1.8%
john | 19 | 1.1%
master | 17 | 0.9%
henry | 16 | 0.9%
frederick | 13 | 0.7%
charles | 13 | 0.7%
james | 13 | 0.7%
Other values (875) | 1263 | 70.0%

Dataset B
Value | Count | Frequency (%)
mr | 260 | 14.3%
miss | 94 | 5.2%
mrs | 63 | 3.5%
william | 31 | 1.7%
john | 22 | 1.2%
master | 21 | 1.2%
henry | 16 | 0.9%
thomas | 13 | 0.7%
george | 12 | 0.7%
charles | 11 | 0.6%
Other values (887) | 1280 | 70.2%
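
The two tables directly above appear to count lowercased word tokens rather than whole values. The sketch below shows one way such counts could be produced, assuming tokens are split on whitespace, stripped of leading and trailing punctuation, and lowercased. That tokenisation is inferred from the entries shown (for example `mr` and `henry`), not taken from the tool's documentation, and `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercased tokens, ignoring punctuation at token edges."""
    counts = Counter()
    for value in values:
        for token in str(value).split():
            word = token.strip(string.punctuation).lower()
            if word:  # skip tokens that were punctuation only
                counts[word] += 1
    return counts


names = ["Weir, Col. John", 'Lovell, Mr. John Hall ("Henry")']
print(word_counts(names).most_common(3))
```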

Most occurring characters

Dataset A
Value | Count | Frequency (%)
(space) | 1358 | 11.3%
r | 949 | 7.9%
e | 843 | 7.0%
a | 836 | 7.0%
n | 676 | 5.6%
s | 663 | 5.5%
i | 656 | 5.5%
M | 568 | 4.7%
o | 511 | 4.3%
l | 510 | 4.3%
Other values (49) | 4401 | 36.8%

Dataset B
Value | Count | Frequency (%)
(space) | 1377 | 11.4%
r | 994 | 8.2%
e | 850 | 7.0%
a | 834 | 6.9%
i | 665 | 5.5%
s | 652 | 5.4%
n | 639 | 5.3%
M | 569 | 4.7%
l | 541 | 4.5%
o | 510 | 4.2%
Other values (49) | 4482 | 37.0%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Lowercase Letter | 7711 | 64.4%
Uppercase Letter | 1811 | 15.1%
Space Separator | 1358 | 11.3%
Other Punctuation | 952 | 8.0%
Close Punctuation | 67 | 0.6%
Open Punctuation | 67 | 0.6%
Dash Punctuation | 5 | < 0.1%

Dataset B
Value | Count | Frequency (%)
Lowercase Letter | 7799 | 64.4%
Uppercase Letter | 1831 | 15.1%
Space Separator | 1377 | 11.4%
Other Punctuation | 957 | 7.9%
Close Punctuation | 72 | 0.6%
Open Punctuation | 72 | 0.6%
Dash Punctuation | 5 | < 0.1%

Most frequent character per category

Space Separator

Dataset A
Value | Count | Frequency (%)
(space) | 1358 | 100.0%

Dataset B
Value | Count | Frequency (%)
(space) | 1377 | 100.0%

Lowercase Letter

Dataset A
Value | Count | Frequency (%)
r | 949 | 12.3%
e | 843 | 10.9%
a | 836 | 10.8%
n | 676 | 8.8%
s | 663 | 8.6%
i | 656 | 8.5%
o | 511 | 6.6%
l | 510 | 6.6%
t | 333 | 4.3%
h | 261 | 3.4%
Other values (16) | 1473 | 19.1%

Dataset B
Value | Count | Frequency (%)
r | 994 | 12.7%
e | 850 | 10.9%
a | 834 | 10.7%
i | 665 | 8.5%
s | 652 | 8.4%
n | 639 | 8.2%
l | 541 | 6.9%
o | 510 | 6.5%
t | 338 | 4.3%
h | 276 | 3.5%
Other values (16) | 1500 | 19.2%

Uppercase Letter

Dataset A
Value | Count | Frequency (%)
M | 568 | 31.4%
A | 115 | 6.4%
J | 109 | 6.0%
C | 100 | 5.5%
S | 95 | 5.2%
H | 94 | 5.2%
E | 80 | 4.4%
L | 70 | 3.9%
W | 70 | 3.9%
R | 58 | 3.2%
Other values (15) | 452 | 25.0%

Dataset B
Value | Count | Frequency (%)
M | 569 | 31.1%
A | 132 | 7.2%
H | 107 | 5.8%
J | 102 | 5.6%
C | 88 | 4.8%
E | 84 | 4.6%
S | 80 | 4.4%
B | 78 | 4.3%
W | 74 | 4.0%
G | 62 | 3.4%
Other values (15) | 455 | 24.8%

Other Punctuation

Dataset A
Value | Count | Frequency (%)
. | 447 | 47.0%
, | 446 | 46.8%
" | 54 | 5.7%
' | 5 | 0.5%

Dataset B
Value | Count | Frequency (%)
, | 446 | 46.6%
. | 446 | 46.6%
" | 62 | 6.5%
' | 3 | 0.3%

Close Punctuation

Dataset A
Value | Count | Frequency (%)
) | 67 | 100.0%

Dataset B
Value | Count | Frequency (%)
) | 72 | 100.0%

Open Punctuation

Dataset A
Value | Count | Frequency (%)
( | 67 | 100.0%

Dataset B
Value | Count | Frequency (%)
( | 72 | 100.0%

Dash Punctuation

Dataset A
Value | Count | Frequency (%)
- | 5 | 100.0%

Dataset B
Value | Count | Frequency (%)
- | 5 | 100.0%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Latin | 9522 | 79.5%
Common | 2449 | 20.5%

Dataset B
Value | Count | Frequency (%)
Latin | 9630 | 79.5%
Common | 2483 | 20.5%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
(space) | 1358 | 55.5%
. | 447 | 18.3%
, | 446 | 18.2%
) | 67 | 2.7%
( | 67 | 2.7%
" | 54 | 2.2%
- | 5 | 0.2%
' | 5 | 0.2%

Dataset B
Value | Count | Frequency (%)
(space) | 1377 | 55.5%
, | 446 | 18.0%
. | 446 | 18.0%
) | 72 | 2.9%
( | 72 | 2.9%
" | 62 | 2.5%
- | 5 | 0.2%
' | 3 | 0.1%

Latin

Dataset A
Value | Count | Frequency (%)
r | 949 | 10.0%
e | 843 | 8.9%
a | 836 | 8.8%
n | 676 | 7.1%
s | 663 | 7.0%
i | 656 | 6.9%
M | 568 | 6.0%
o | 511 | 5.4%
l | 510 | 5.4%
t | 333 | 3.5%
Other values (41) | 2977 | 31.3%

Dataset B
Value | Count | Frequency (%)
r | 994 | 10.3%
e | 850 | 8.8%
a | 834 | 8.7%
i | 665 | 6.9%
s | 652 | 6.8%
n | 639 | 6.6%
M | 569 | 5.9%
l | 541 | 5.6%
o | 510 | 5.3%
t | 338 | 3.5%
Other values (41) | 3038 | 31.5%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 11971 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 12113 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
(space) | 1358 | 11.3%
r | 949 | 7.9%
e | 843 | 7.0%
a | 836 | 7.0%
n | 676 | 5.6%
s | 663 | 5.5%
i | 656 | 5.5%
M | 568 | 4.7%
o | 511 | 4.3%
l | 510 | 4.3%
Other values (49) | 4401 | 36.8%

Dataset B
Value | Count | Frequency (%)
(space) | 1377 | 11.4%
r | 994 | 8.2%
e | 850 | 7.0%
a | 834 | 6.9%
i | 665 | 5.5%
s | 652 | 5.4%
n | 639 | 5.3%
M | 569 | 4.7%
l | 541 | 4.5%
o | 510 | 4.2%
Other values (49) | 4482 | 37.0%

Sex
Categorical

Statistic | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Value | Count (Dataset A) | Count (Dataset B)
male | 282 | 289
female | 164 | 157

Length

Statistic | Dataset A | Dataset B
Max length | 6 | 6
Median length | 4 | 4
Mean length | 4.735426 | 4.7040359
Min length | 4 | 4

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 2112 | 2098
Distinct characters | 5 | 5
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | male | male
2nd row | male | male
3rd row | male | male
4th row | male | male
5th row | male | male

Common Values

Dataset A
Value | Count | Frequency (%)
male | 282 | 63.2%
female | 164 | 36.8%

Dataset B
Value | Count | Frequency (%)
male | 289 | 64.8%
female | 157 | 35.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A

[Figure: common values plot]

Dataset B

[Figure: common values plot]

Words

Dataset A
Value | Count | Frequency (%)
male | 282 | 63.2%
female | 164 | 36.8%

Dataset B
Value | Count | Frequency (%)
male | 289 | 64.8%
female | 157 | 35.2%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
e | 610 | 28.9%
m | 446 | 21.1%
a | 446 | 21.1%
l | 446 | 21.1%
f | 164 | 7.8%

Dataset B
Value | Count | Frequency (%)
e | 603 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 157 | 7.5%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Lowercase Letter | 2112 | 100.0%

Dataset B
Value | Count | Frequency (%)
Lowercase Letter | 2098 | 100.0%

Most frequent character per category

Lowercase Letter

Dataset A
Value | Count | Frequency (%)
e | 610 | 28.9%
m | 446 | 21.1%
a | 446 | 21.1%
l | 446 | 21.1%
f | 164 | 7.8%

Dataset B
Value | Count | Frequency (%)
e | 603 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 157 | 7.5%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Latin | 2112 | 100.0%

Dataset B
Value | Count | Frequency (%)
Latin | 2098 | 100.0%

Most frequent character per script

Latin

Dataset A
Value | Count | Frequency (%)
e | 610 | 28.9%
m | 446 | 21.1%
a | 446 | 21.1%
l | 446 | 21.1%
f | 164 | 7.8%

Dataset B
Value | Count | Frequency (%)
e | 603 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 157 | 7.5%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 2112 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 2098 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
e | 610 | 28.9%
m | 446 | 21.1%
a | 446 | 21.1%
l | 446 | 21.1%
f | 164 | 7.8%

Dataset B
Value | Count | Frequency (%)
e | 603 | 28.7%
m | 446 | 21.3%
a | 446 | 21.3%
l | 446 | 21.3%
f | 157 | 7.5%

Age
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 74 | 75
Distinct (%) | 20.4% | 21.0%
Missing | 84 | 89
Missing (%) | 18.8% | 20.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 29.562845 | 29.163165
Minimum | 0.42 | 0.67
Maximum | 80 | 80
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0.42 | 0.67
5-th percentile | 3.05 | 4
Q1 | 21 | 20
Median | 28 | 28
Q3 | 37.75 | 36
95-th percentile | 54.95 | 54.4
Maximum | 80 | 80
Range | 79.58 | 79.33
Interquartile range (IQR) | 16.75 | 16

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 14.498621 | 14.637636
Coefficient of variation (CV) | 0.49043387 | 0.50192205
Kurtosis | 0.23576311 | 0.24530319
Mean | 29.562845 | 29.163165
Median Absolute Deviation (MAD) | 8 | 8
Skewness | 0.32900359 | 0.40175093
Sum | 10701.75 | 10411.25
Variance | 210.21 | 214.26038
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Dataset A
Value | Count | Frequency (%)
28 | 17 | 3.8%
24 | 15 | 3.4%
22 | 14 | 3.1%
18 | 14 | 3.1%
27 | 14 | 3.1%
36 | 14 | 3.1%
21 | 13 | 2.9%
19 | 11 | 2.5%
35 | 10 | 2.2%
29 | 10 | 2.2%
Other values (64) | 230 | 51.6%
(Missing) | 84 | 18.8%

Dataset B
Value | Count | Frequency (%)
22 | 17 | 3.8%
24 | 16 | 3.6%
35 | 12 | 2.7%
18 | 12 | 2.7%
21 | 12 | 2.7%
36 | 12 | 2.7%
34 | 11 | 2.5%
29 | 11 | 2.5%
28 | 11 | 2.5%
30 | 11 | 2.5%
Other values (65) | 232 | 52.0%
(Missing) | 89 | 20.0%

Extreme values (minimum 10 values)

Dataset A
Value | Count | Frequency (%)
0.42 | 1 | 0.2%
0.75 | 2 | 0.4%
0.83 | 1 | 0.2%
1 | 5 | 1.1%
2 | 7 | 1.6%
3 | 3 | 0.7%
4 | 3 | 0.7%
5 | 4 | 0.9%
6 | 1 | 0.2%
7 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0.67 | 1 | 0.2%
0.75 | 1 | 0.2%
0.83 | 1 | 0.2%
1 | 4 | 0.9%
2 | 6 | 1.3%
3 | 3 | 0.7%
4 | 6 | 1.3%
5 | 4 | 0.9%
6 | 1 | 0.2%
7 | 1 | 0.2%

SibSp
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 7 | 7
Distinct (%) | 1.6% | 1.6%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.54035874 | 0.58071749
Minimum | 0 | 0
Maximum | 8 | 8
Zeros | 307 | 289
Zeros (%) | 68.8% | 64.8%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5-th percentile | 0 | 0
Q1 | 0 | 0
Median | 0 | 0
Q3 | 1 | 1
95-th percentile | 3 | 3
Maximum | 8 | 8
Range | 8 | 8
Interquartile range (IQR) | 1 | 1

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 1.1541557 | 1.1343405
Coefficient of variation (CV) | 2.1359064 | 1.9533432
Kurtosis | 16.653329 | 14.240695
Mean | 0.54035874 | 0.58071749
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 3.6055703 | 3.3066759
Sum | 241 | 259
Variance | 1.3320754 | 1.2867285
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=7)
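
Common values

Dataset A
Value | Count | Frequency (%)
0 | 307 | 68.8%
1 | 98 | 22.0%
2 | 15 | 3.4%
3 | 10 | 2.2%
4 | 9 | 2.0%
8 | 4 | 0.9%
5 | 3 | 0.7%

Dataset B
Value | Count | Frequency (%)
0 | 289 | 64.8%
1 | 117 | 26.2%
2 | 12 | 2.7%
4 | 11 | 2.5%
3 | 10 | 2.2%
5 | 4 | 0.9%
8 | 3 | 0.7%

Extreme values (minimum 10 values)

Dataset A
Value | Count | Frequency (%)
0 | 307 | 68.8%
1 | 98 | 22.0%
2 | 15 | 3.4%
3 | 10 | 2.2%
4 | 9 | 2.0%
5 | 3 | 0.7%
8 | 4 | 0.9%

Dataset B
Value | Count | Frequency (%)
0 | 289 | 64.8%
1 | 117 | 26.2%
2 | 12 | 2.7%
3 | 10 | 2.2%
4 | 11 | 2.5%
5 | 4 | 0.9%
8 | 3 | 0.7%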

Parch
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 7 | 7
Distinct (%) | 1.6% | 1.6%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.34529148 | 0.44170404
Minimum | 0 | 0
Maximum | 6 | 6
Zeros | 343 | 326
Zeros (%) | 76.9% | 73.1%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5-th percentile | 0 | 0
Q1 | 0 | 0
Median | 0 | 0
Q3 | 0 | 1
95-th percentile | 2 | 2
Maximum | 6 | 6
Range | 6 | 6
Interquartile range (IQR) | 0 | 1

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 0.74173364 | 0.85326117
Coefficient of variation (CV) | 2.1481377 | 1.9317486
Kurtosis | 12.394247 | 7.9834361
Mean | 0.34529148 | 0.44170404
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 2.931901 | 2.439101
Sum | 154 | 197
Variance | 0.55016879 | 0.72805462
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=7)
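
Common values

Dataset A
Value | Count | Frequency (%)
0 | 343 | 76.9%
1 | 63 | 14.1%
2 | 35 | 7.8%
3 | 2 | 0.4%
6 | 1 | 0.2%
4 | 1 | 0.2%
5 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0 | 326 | 73.1%
1 | 59 | 13.2%
2 | 54 | 12.1%
4 | 2 | 0.4%
3 | 2 | 0.4%
5 | 2 | 0.4%
6 | 1 | 0.2%

Extreme values (minimum 10 values)

Dataset A
Value | Count | Frequency (%)
0 | 343 | 76.9%
1 | 63 | 14.1%
2 | 35 | 7.8%
3 | 2 | 0.4%
4 | 1 | 0.2%
5 | 1 | 0.2%
6 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0 | 326 | 73.1%
1 | 59 | 13.2%
2 | 54 | 12.1%
3 | 2 | 0.4%
4 | 2 | 0.4%
5 | 2 | 0.4%
6 | 1 | 0.2%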

Ticket
Categorical

Statistic | Dataset A | Dataset B
Distinct | 375 | 375
Distinct (%) | 84.1% | 84.1%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Value (Dataset A) | Count
CA 2144 | 4
CA. 2343 | 4
1601 | 4
LINE | 4
4133 | 3
Other values (370) | 427

Value (Dataset B) | Count
CA 2144 | 5
17421 | 4
347082 | 4
347088 | 4
347077 | 4
Other values (370) | 425

Length

Statistic | Dataset A | Dataset B
Max length | 18 | 18
Median length | 17 | 17
Mean length | 6.67713 | 6.6950673
Min length | 3 | 3

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 2978 | 2986
Distinct characters | 35 | 31
Distinct categories | 5 | 5
Distinct scripts | 2 | 2
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 322 | 323
Unique (%) | 72.2% | 72.4%

Sample

Row | Dataset A | Dataset B
1st row | 3101295 | 2661
2nd row | 349240 | 113803
3rd row | 19972 | 13507
4th row | PC 17593 | 349242
5th row | SOTON/OQ 392089 | 17421

Common Values

Dataset A
Value | Count | Frequency (%)
CA 2144 | 4 | 0.9%
CA. 2343 | 4 | 0.9%
1601 | 4 | 0.9%
LINE | 4 | 0.9%
4133 | 3 | 0.7%
S.O.C. 14879 | 3 | 0.7%
113760 | 3 | 0.7%
PC 17760 | 3 | 0.7%
347082 | 3 | 0.7%
347088 | 3 | 0.7%
Other values (365) | 412 | 92.4%

Dataset B
Value | Count | Frequency (%)
CA 2144 | 5 | 1.1%
17421 | 4 | 0.9%
347082 | 4 | 0.9%
347088 | 4 | 0.9%
347077 | 4 | 0.9%
C.A. 34651 | 3 | 0.7%
CA. 2343 | 3 | 0.7%
35273 | 3 | 0.7%
347742 | 3 | 0.7%
PC 17755 | 3 | 0.7%
Other values (365) | 410 | 91.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Dataset B


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Words

Dataset A
Value | Count | Frequency (%)
pc | 36 | 6.4%
c.a | 12 | 2.1%
ca | 9 | 1.6%
a/5 | 7 | 1.2%
ston/o | 6 | 1.1%
2 | 6 | 1.1%
2144 | 4 | 0.7%
soton/oq | 4 | 0.7%
line | 4 | 0.7%
1601 | 4 | 0.7%
Other values (393) | 473 | 83.7%

Dataset B
Value | Count | Frequency (%)
pc | 34 | 6.0%
c.a | 14 | 2.5%
ca | 9 | 1.6%
a/5 | 8 | 1.4%
w./c | 6 | 1.1%
2 | 6 | 1.1%
ston/o | 6 | 1.1%
sc/paris | 5 | 0.9%
2144 | 5 | 0.9%
17421 | 4 | 0.7%
Other values (393) | 468 | 82.8%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
3 | 373 | 12.5%
1 | 348 | 11.7%
2 | 289 | 9.7%
7 | 247 | 8.3%
4 | 235 | 7.9%
6 | 228 | 7.7%
0 | 213 | 7.2%
5 | 178 | 6.0%
9 | 150 | 5.0%
8 | 126 | 4.2%
Other values (25) | 591 | 19.8%

Dataset B
Value | Count | Frequency (%)
3 | 374 | 12.5%
1 | 337 | 11.3%
2 | 293 | 9.8%
7 | 246 | 8.2%
4 | 222 | 7.4%
6 | 217 | 7.3%
0 | 200 | 6.7%
5 | 197 | 6.6%
9 | 167 | 5.6%
8 | 139 | 4.7%
Other values (21) | 594 | 19.9%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 2387 | 80.2%
Uppercase Letter | 323 | 10.8%
Other Punctuation | 133 | 4.5%
Space Separator | 119 | 4.0%
Lowercase Letter | 16 | 0.5%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 2392 | 80.1%
Uppercase Letter | 323 | 10.8%
Other Punctuation | 144 | 4.8%
Space Separator | 119 | 4.0%
Lowercase Letter | 8 | 0.3%

Most frequent character per category

Decimal Number

Dataset A
Value | Count | Frequency (%)
3 | 373 | 15.6%
1 | 348 | 14.6%
2 | 289 | 12.1%
7 | 247 | 10.3%
4 | 235 | 9.8%
6 | 228 | 9.6%
0 | 213 | 8.9%
5 | 178 | 7.5%
9 | 150 | 6.3%
8 | 126 | 5.3%

Dataset B
Value | Count | Frequency (%)
3 | 374 | 15.6%
1 | 337 | 14.1%
2 | 293 | 12.2%
7 | 246 | 10.3%
4 | 222 | 9.3%
6 | 217 | 9.1%
0 | 200 | 8.4%
5 | 197 | 8.2%
9 | 167 | 7.0%
8 | 139 | 5.8%

Space Separator

Dataset A
Value | Count | Frequency (%)
(space) | 119 | 100.0%

Dataset B
Value | Count | Frequency (%)
(space) | 119 | 100.0%

Other Punctuation

Dataset A
Value | Count | Frequency (%)
. | 92 | 69.2%
/ | 41 | 30.8%

Dataset B
Value | Count | Frequency (%)
. | 97 | 67.4%
/ | 47 | 32.6%

Uppercase Letter

Dataset A
Value | Count | Frequency (%)
C | 81 | 25.1%
P | 54 | 16.7%
O | 46 | 14.2%
A | 37 | 11.5%
S | 31 | 9.6%
N | 21 | 6.5%
T | 17 | 5.3%
Q | 7 | 2.2%
E | 6 | 1.9%
I | 6 | 1.9%
Other values (6) | 17 | 5.3%

Dataset B
Value | Count | Frequency (%)
C | 80 | 24.8%
P | 54 | 16.7%
O | 48 | 14.9%
A | 39 | 12.1%
S | 33 | 10.2%
N | 18 | 5.6%
T | 17 | 5.3%
W | 11 | 3.4%
Q | 8 | 2.5%
I | 4 | 1.2%
Other values (4) | 11 | 3.4%

Lowercase Letter

Dataset A
Value | Count | Frequency (%)
a | 4 | 25.0%
s | 4 | 25.0%
i | 3 | 18.8%
r | 3 | 18.8%
l | 1 | 6.2%
e | 1 | 6.2%

Dataset B
Value | Count | Frequency (%)
r | 2 | 25.0%
i | 2 | 25.0%
s | 2 | 25.0%
a | 2 | 25.0%

Most occurring scripts

Dataset A
Value | Count | Frequency (%)
Common | 2639 | 88.6%
Latin | 339 | 11.4%

Dataset B
Value | Count | Frequency (%)
Common | 2655 | 88.9%
Latin | 331 | 11.1%

Most frequent character per script

Common

Dataset A
Value | Count | Frequency (%)
3 | 373 | 14.1%
1 | 348 | 13.2%
2 | 289 | 11.0%
7 | 247 | 9.4%
4 | 235 | 8.9%
6 | 228 | 8.6%
0 | 213 | 8.1%
5 | 178 | 6.7%
9 | 150 | 5.7%
8 | 126 | 4.8%
Other values (3) | 252 | 9.5%

Dataset B
Value | Count | Frequency (%)
3 | 374 | 14.1%
1 | 337 | 12.7%
2 | 293 | 11.0%
7 | 246 | 9.3%
4 | 222 | 8.4%
6 | 217 | 8.2%
0 | 200 | 7.5%
5 | 197 | 7.4%
9 | 167 | 6.3%
8 | 139 | 5.2%
Other values (3) | 263 | 9.9%

Latin

Dataset A
Value | Count | Frequency (%)
C | 81 | 23.9%
P | 54 | 15.9%
O | 46 | 13.6%
A | 37 | 10.9%
S | 31 | 9.1%
N | 21 | 6.2%
T | 17 | 5.0%
Q | 7 | 2.1%
E | 6 | 1.8%
I | 6 | 1.8%
Other values (12) | 33 | 9.7%

Dataset B
Value | Count | Frequency (%)
C | 80 | 24.2%
P | 54 | 16.3%
O | 48 | 14.5%
A | 39 | 11.8%
S | 33 | 10.0%
N | 18 | 5.4%
T | 17 | 5.1%
W | 11 | 3.3%
Q | 8 | 2.4%
I | 4 | 1.2%
Other values (8) | 19 | 5.7%

Most occurring blocks

Dataset A
Value | Count | Frequency (%)
ASCII | 2978 | 100.0%

Dataset B
Value | Count | Frequency (%)
ASCII | 2986 | 100.0%

Most frequent character per block

ASCII

Dataset A
Value | Count | Frequency (%)
3 | 373 | 12.5%
1 | 348 | 11.7%
2 | 289 | 9.7%
7 | 247 | 8.3%
4 | 235 | 7.9%
6 | 228 | 7.7%
0 | 213 | 7.2%
5 | 178 | 6.0%
9 | 150 | 5.0%
8 | 126 | 4.2%
Other values (25) | 591 | 19.8%

Dataset B
Value | Count | Frequency (%)
3 | 374 | 12.5%
1 | 337 | 11.3%
2 | 293 | 9.8%
7 | 246 | 8.2%
4 | 222 | 7.4%
6 | 217 | 7.3%
0 | 200 | 6.7%
5 | 197 | 6.6%
9 | 167 | 5.6%
8 | 139 | 4.7%
Other values (21) | 594 | 19.9%

Fare
Real number (ℝ)

Statistic | Dataset A | Dataset B
Distinct | 172 | 177
Distinct (%) | 38.6% | 39.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 32.777746 | 35.344403
Minimum | 0 | 0
Maximum | 512.3292 | 512.3292
Zeros | 9 | 10
Zeros (%) | 2.0% | 2.2%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Statistic | Dataset A | Dataset B
Minimum | 0 | 0
5-th percentile | 7.225 | 7.225
Q1 | 7.8958 | 8.05
Median | 14.4542 | 15.975
Q3 | 31.275 | 32.875
95-th percentile | 110.8833 | 112.67708
Maximum | 512.3292 | 512.3292
Range | 512.3292 | 512.3292
Interquartile range (IQR) | 23.3792 | 24.825

Descriptive statistics

Statistic | Dataset A | Dataset B
Standard deviation | 51.480352 | 56.871171
Coefficient of variation (CV) | 1.5705886 | 1.6090573
Kurtosis | 36.46659 | 34.669628
Mean | 32.777746 | 35.344403
Median Absolute Deviation (MAD) | 6.9584 | 8.7458
Skewness | 5.0416265 | 5.0857181
Sum | 14618.874 | 15763.604
Variance | 2650.2267 | 3234.3301
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Dataset A
Value | Count | Frequency (%)
7.75 | 22 | 4.9%
13 | 21 | 4.7%
8.05 | 19 | 4.3%
7.8958 | 17 | 3.8%
26 | 14 | 3.1%
10.5 | 12 | 2.7%
7.925 | 10 | 2.2%
26.55 | 10 | 2.2%
0 | 9 | 2.0%
7.8542 | 9 | 2.0%
Other values (162) | 303 | 67.9%

Dataset B
Value | Count | Frequency (%)
8.05 | 25 | 5.6%
13 | 17 | 3.8%
7.75 | 16 | 3.6%
7.8958 | 15 | 3.4%
26 | 15 | 3.4%
10.5 | 12 | 2.7%
26.55 | 11 | 2.5%
0 | 10 | 2.2%
7.2292 | 10 | 2.2%
7.925 | 7 | 1.6%
Other values (167) | 308 | 69.1%

Extreme values (minimum 10 values)

Dataset A
Value | Count | Frequency (%)
0 | 9 | 2.0%
5 | 1 | 0.2%
6.2375 | 1 | 0.2%
6.4375 | 1 | 0.2%
6.4958 | 2 | 0.4%
6.75 | 2 | 0.4%
6.95 | 1 | 0.2%
7.0458 | 1 | 0.2%
7.05 | 3 | 0.7%
7.125 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0 | 10 | 2.2%
5 | 1 | 0.2%
6.4958 | 2 | 0.4%
6.75 | 1 | 0.2%
6.8583 | 1 | 0.2%
6.95 | 1 | 0.2%
7.0458 | 1 | 0.2%
7.05 | 3 | 0.7%
7.0542 | 1 | 0.2%
7.125 | 1 | 0.2%

Cabin
Categorical

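Statistic | Dataset A | Dataset B
Distinct | 82 | 93
Distinct (%) | 82.8% | 83.0%
Missing | 347 | 334
Missing (%) | 77.8% | 74.9%
Memory size | 7.0 KiB | 7.0 KiB

Value (Dataset A) | Count
B96 B98 | 3
B18 | 2
D | 2
E101 | 2
C65 | 2
Other values (77) | 88

Value (Dataset B) | Count
B96 B98 | 3
F33 | 3
C23 C25 C27 | 3
D36 | 2
C123 | 2
Other values (88) | 99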

Length

Statistic | Dataset A | Dataset B
Max length | 11 | 15
Median length | 3 | 3
Mean length | 3.4646465 | 3.75
Min length | 1 | 2

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 343 | 420
Distinct characters | 18 | 18
Distinct categories | 3 | 3
Distinct scripts | 2 | 2
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 66 | 77
Unique (%) | 66.7% | 68.8%

Sample

Row | Dataset A | Dataset B
1st row | B82 B84 | C123
2nd row | C78 | E44
3rd row | B3 | C68
4th row | C124 | C126
5th row | D6 | B49

Common Values

Dataset A
Value | Count | Frequency (%)
B96 B98 | 3 | 0.7%
B18 | 2 | 0.4%
D | 2 | 0.4%
E101 | 2 | 0.4%
C65 | 2 | 0.4%
B51 B53 B55 | 2 | 0.4%
D36 | 2 | 0.4%
C68 | 2 | 0.4%
E8 | 2 | 0.4%
C93 | 2 | 0.4%
Other values (72) | 78 | 17.5%
(Missing) | 347 | 77.8%

Dataset B
Value | Count | Frequency (%)
B96 B98 | 3 | 0.7%
F33 | 3 | 0.7%
C23 C25 C27 | 3 | 0.7%
D36 | 2 | 0.4%
C123 | 2 | 0.4%
C93 | 2 | 0.4%
E67 | 2 | 0.4%
C65 | 2 | 0.4%
C83 | 2 | 0.4%
F2 | 2 | 0.4%
Other values (83) | 89 | 20.0%
(Missing) | 334 | 74.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Dataset B


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Words

Dataset A
Value | Count | Frequency (%)
b96 | 3 | 2.7%
b98 | 3 | 2.7%
e8 | 2 | 1.8%
b5 | 2 | 1.8%
e24 | 2 | 1.8%
b35 | 2 | 1.8%
c27 | 2 | 1.8%
c25 | 2 | 1.8%
c23 | 2 | 1.8%
d20 | 2 | 1.8%
Other values (80) | 91 | 80.5%

Dataset B
Value | Count | Frequency (%)
b96 | 3 | 2.2%
f33 | 3 | 2.2%
c23 | 3 | 2.2%
c25 | 3 | 2.2%
c27 | 3 | 2.2%
b98 | 3 | 2.2%
b51 | 2 | 1.5%
f | 2 | 1.5%
f4 | 2 | 1.5%
e8 | 2 | 1.5%
Other values (96) | 108 | 80.6%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
B | 32 | 9.3%
2 | 30 | 8.7%
C | 30 | 8.7%
3 | 28 | 8.2%
1 | 27 | 7.9%
5 | 26 | 7.6%
6 | 24 | 7.0%
8 | 19 | 5.5%
D | 19 | 5.5%
9 | 19 | 5.5%
Other values (8) | 89 | 25.9%

Dataset B
Value | Count | Frequency (%)
C | 43 | 10.2%
3 | 39 | 9.3%
2 | 39 | 9.3%
B | 36 | 8.6%
1 | 33 | 7.9%
6 | 32 | 7.6%
5 | 25 | 6.0%
8 | 23 | 5.5%
(space) | 22 | 5.2%
4 | 22 | 5.2%
Other values (8) | 106 | 25.2%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
Decimal Number | 216 | 63.0%
Uppercase Letter | 113 | 32.9%
Space Separator | 14 | 4.1%

Dataset B
Value | Count | Frequency (%)
Decimal Number | 264 | 62.9%
Uppercase Letter | 134 | 31.9%
Space Separator | 22 | 5.2%

Most frequent character per category

Uppercase Letter

Dataset A

Value | Count | Frequency (%)
B | 32 | 28.3%
C | 30 | 26.5%
D | 19 | 16.8%
E | 18 | 15.9%
A | 9 | 8.0%
F | 4 | 3.5%
G | 1 | 0.9%

Dataset B

Value | Count | Frequency (%)
C | 43 | 32.1%
B | 36 | 26.9%
E | 19 | 14.2%
D | 18 | 13.4%
F | 9 | 6.7%
A | 6 | 4.5%
G | 3 | 2.2%

Decimal Number

Dataset A

Value | Count | Frequency (%)
2 | 30 | 13.9%
3 | 28 | 13.0%
1 | 27 | 12.5%
5 | 26 | 12.0%
6 | 24 | 11.1%
8 | 19 | 8.8%
9 | 19 | 8.8%
4 | 18 | 8.3%
0 | 13 | 6.0%
7 | 12 | 5.6%

Dataset B

Value | Count | Frequency (%)
3 | 39 | 14.8%
2 | 39 | 14.8%
1 | 33 | 12.5%
6 | 32 | 12.1%
5 | 25 | 9.5%
8 | 23 | 8.7%
4 | 22 | 8.3%
9 | 19 | 7.2%
7 | 16 | 6.1%
0 | 16 | 6.1%

Space Separator

Dataset A

Value | Count | Frequency (%)
(space) | 14 | 100.0%

Dataset B

Value | Count | Frequency (%)
(space) | 22 | 100.0%

Most occurring scripts

Dataset A

Value | Count | Frequency (%)
Common | 230 | 67.1%
Latin | 113 | 32.9%

Dataset B

Value | Count | Frequency (%)
Common | 286 | 68.1%
Latin | 134 | 31.9%

Most frequent character per script

Latin

Dataset A

Value | Count | Frequency (%)
B | 32 | 28.3%
C | 30 | 26.5%
D | 19 | 16.8%
E | 18 | 15.9%
A | 9 | 8.0%
F | 4 | 3.5%
G | 1 | 0.9%

Dataset B

Value | Count | Frequency (%)
C | 43 | 32.1%
B | 36 | 26.9%
E | 19 | 14.2%
D | 18 | 13.4%
F | 9 | 6.7%
A | 6 | 4.5%
G | 3 | 2.2%

Common

Dataset A

Value | Count | Frequency (%)
2 | 30 | 13.0%
3 | 28 | 12.2%
1 | 27 | 11.7%
5 | 26 | 11.3%
6 | 24 | 10.4%
8 | 19 | 8.3%
9 | 19 | 8.3%
4 | 18 | 7.8%
(space) | 14 | 6.1%
0 | 13 | 5.7%

Dataset B

Value | Count | Frequency (%)
3 | 39 | 13.6%
2 | 39 | 13.6%
1 | 33 | 11.5%
6 | 32 | 11.2%
5 | 25 | 8.7%
8 | 23 | 8.0%
(space) | 22 | 7.7%
4 | 22 | 7.7%
9 | 19 | 6.6%
7 | 16 | 5.6%

Most occurring blocks

Dataset A

Value | Count | Frequency (%)
ASCII | 343 | 100.0%

Dataset B

Value | Count | Frequency (%)
ASCII | 420 | 100.0%

Most frequent character per block

ASCII

Dataset A

Value | Count | Frequency (%)
B | 32 | 9.3%
2 | 30 | 8.7%
C | 30 | 8.7%
3 | 28 | 8.2%
1 | 27 | 7.9%
5 | 26 | 7.6%
6 | 24 | 7.0%
8 | 19 | 5.5%
D | 19 | 5.5%
9 | 19 | 5.5%
Other values (8) | 89 | 25.9%

Dataset B

Value | Count | Frequency (%)
C | 43 | 10.2%
3 | 39 | 9.3%
2 | 39 | 9.3%
B | 36 | 8.6%
1 | 33 | 7.9%
6 | 32 | 7.6%
5 | 25 | 6.0%
8 | 23 | 5.5%
(space) | 22 | 5.2%
4 | 22 | 5.2%
Other values (8) | 106 | 25.2%

Embarked
Categorical

Statistic | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Dataset A: S 317, C 86, Q 43
Dataset B: S 312, C 98, Q 36

Length

Statistic | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Statistic | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | S | C
2nd row | C | S
3rd row | S | S
4th row | C | S
5th row | S | C

Common Values

Dataset A

Value | Count | Frequency (%)
S | 317 | 71.1%
C | 86 | 19.3%
Q | 43 | 9.6%

Dataset B

Value | Count | Frequency (%)
S | 312 | 70.0%
C | 98 | 22.0%
Q | 36 | 8.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B; images not reproduced.]

Dataset A

Value | Count | Frequency (%)
s | 317 | 71.1%
c | 86 | 19.3%
q | 43 | 9.6%

Dataset B

Value | Count | Frequency (%)
s | 312 | 70.0%
c | 98 | 22.0%
q | 36 | 8.1%

Most occurring characters

Dataset A

Value | Count | Frequency (%)
S | 317 | 71.1%
C | 86 | 19.3%
Q | 43 | 9.6%

Dataset B

Value | Count | Frequency (%)
S | 312 | 70.0%
C | 98 | 22.0%
Q | 36 | 8.1%

Most occurring categories

Dataset A

Value | Count | Frequency (%)
Uppercase Letter | 446 | 100.0%

Dataset B

Value | Count | Frequency (%)
Uppercase Letter | 446 | 100.0%

Most frequent character per category

Uppercase Letter

Dataset A

Value | Count | Frequency (%)
S | 317 | 71.1%
C | 86 | 19.3%
Q | 43 | 9.6%

Dataset B

Value | Count | Frequency (%)
S | 312 | 70.0%
C | 98 | 22.0%
Q | 36 | 8.1%

Most occurring scripts

Dataset A

Value | Count | Frequency (%)
Latin | 446 | 100.0%

Dataset B

Value | Count | Frequency (%)
Latin | 446 | 100.0%

Most frequent character per script

Latin

Dataset A

Value | Count | Frequency (%)
S | 317 | 71.1%
C | 86 | 19.3%
Q | 43 | 9.6%

Dataset B

Value | Count | Frequency (%)
S | 312 | 70.0%
C | 98 | 22.0%
Q | 36 | 8.1%

Most occurring blocks

Dataset A

Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Dataset B

Value | Count | Frequency (%)
ASCII | 446 | 100.0%

Most frequent character per block

ASCII

Dataset A

Value | Count | Frequency (%)
S | 317 | 71.1%
C | 86 | 19.3%
Q | 43 | 9.6%

Dataset B

Value | Count | Frequency (%)
S | 312 | 70.0%
C | 98 | 22.0%
Q | 36 | 8.1%

Interactions

[Figures: 25 pairs of interaction plots, one plot each for Dataset A and Dataset B; images not reproduced.]

Correlations

[Figures: correlation plots for Dataset A and Dataset B; images not reproduced.]

Dataset A

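Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Cabin | Embarked
PassengerId | 1.000 | -0.027 | -0.065 | 0.031 | 0.023 | 0.067 | 0.000 | 0.000 | 0.099 | 0.000
Age | -0.027 | 1.000 | -0.213 | -0.273 | 0.162 | 0.160 | 0.311 | 0.149 | 0.280 | 0.182
SibSp | -0.065 | -0.213 | 1.000 | 0.465 | 0.407 | 0.183 | 0.121 | 0.227 | 0.423 | 0.093
Parch | 0.031 | -0.273 | 0.465 | 1.000 | 0.402 | 0.129 | 0.000 | 0.254 | 0.383 | 0.078
Fare | 0.023 | 0.162 | 0.407 | 0.402 | 1.000 | 0.266 | 0.459 | 0.174 | 0.273 | 0.192
Survived | 0.067 | 0.160 | 0.183 | 0.129 | 0.266 | 1.000 | 0.308 | 0.566 | 0.242 | 0.109
Pclass | 0.000 | 0.311 | 0.121 | 0.000 | 0.459 | 0.308 | 1.000 | 0.105 | 0.421 | 0.262
Sex | 0.000 | 0.149 | 0.227 | 0.254 | 0.174 | 0.566 | 0.105 | 1.000 | 0.160 | 0.098
Cabin | 0.099 | 0.280 | 0.423 | 0.383 | 0.273 | 0.242 | 0.421 | 0.160 | 1.000 | 0.407
Embarked | 0.000 | 0.182 | 0.093 | 0.078 | 0.192 | 0.109 | 0.262 | 0.098 | 0.407 | 1.000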

Dataset B

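Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Cabin | Embarked
PassengerId | 1.000 | 0.092 | -0.029 | -0.020 | -0.027 | 0.014 | 0.000 | 0.038 | 0.022 | 0.000
Age | 0.092 | 1.000 | -0.238 | -0.273 | 0.130 | 0.075 | 0.275 | 0.112 | 0.288 | 0.081
SibSp | -0.029 | -0.238 | 1.000 | 0.488 | 0.466 | 0.208 | 0.152 | 0.215 | 0.411 | 0.114
Parch | -0.020 | -0.273 | 0.488 | 1.000 | 0.388 | 0.083 | 0.000 | 0.257 | 0.355 | 0.032
Fare | -0.027 | 0.130 | 0.466 | 0.388 | 1.000 | 0.273 | 0.472 | 0.186 | 0.353 | 0.229
Survived | 0.014 | 0.075 | 0.208 | 0.083 | 0.273 | 1.000 | 0.330 | 0.532 | 0.096 | 0.176
Pclass | 0.000 | 0.275 | 0.152 | 0.000 | 0.472 | 0.330 | 1.000 | 0.131 | 0.418 | 0.263
Sex | 0.038 | 0.112 | 0.215 | 0.257 | 0.186 | 0.532 | 0.131 | 1.000 | 0.000 | 0.094
Cabin | 0.022 | 0.288 | 0.411 | 0.355 | 0.353 | 0.096 | 0.418 | 0.000 | 1.000 | 0.405
Embarked | 0.000 | 0.081 | 0.114 | 0.032 | 0.229 | 0.176 | 0.263 | 0.094 | 0.405 | 1.000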
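
To relate these matrices to the "High Correlation" alerts in the overview, the sketch below scans the Dataset A coefficients for off-diagonal pairs at or above a cutoff. The cutoff of 0.5 is an assumption chosen for illustration (the report does not state the threshold it uses), and `high_correlation_pairs` is a hypothetical helper, not part of the profiling library.

```python
# Variable order and coefficients copied from the Dataset A matrix.
NAMES = ["PassengerId", "Age", "SibSp", "Parch", "Fare",
         "Survived", "Pclass", "Sex", "Cabin", "Embarked"]
MATRIX_A = [
    [1.000, -0.027, -0.065, 0.031, 0.023, 0.067, 0.000, 0.000, 0.099, 0.000],
    [-0.027, 1.000, -0.213, -0.273, 0.162, 0.160, 0.311, 0.149, 0.280, 0.182],
    [-0.065, -0.213, 1.000, 0.465, 0.407, 0.183, 0.121, 0.227, 0.423, 0.093],
    [0.031, -0.273, 0.465, 1.000, 0.402, 0.129, 0.000, 0.254, 0.383, 0.078],
    [0.023, 0.162, 0.407, 0.402, 1.000, 0.266, 0.459, 0.174, 0.273, 0.192],
    [0.067, 0.160, 0.183, 0.129, 0.266, 1.000, 0.308, 0.566, 0.242, 0.109],
    [0.000, 0.311, 0.121, 0.000, 0.459, 0.308, 1.000, 0.105, 0.421, 0.262],
    [0.000, 0.149, 0.227, 0.254, 0.174, 0.566, 0.105, 1.000, 0.160, 0.098],
    [0.099, 0.280, 0.423, 0.383, 0.273, 0.242, 0.421, 0.160, 1.000, 0.407],
    [0.000, 0.182, 0.093, 0.078, 0.192, 0.109, 0.262, 0.098, 0.407, 1.000],
]


def high_correlation_pairs(names, matrix, threshold=0.5):
    """Return (row, column, value) for off-diagonal pairs with |value| >= threshold."""
    pairs = []
    for i in range(len(names)):
        # Upper triangle only: the matrix is symmetric.
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


pairs_a = high_correlation_pairs(NAMES, MATRIX_A)
print(pairs_a)  # [('Survived', 'Sex', 0.566)]
```

Under that assumed cutoff, Survived and Sex are the only pair flagged for Dataset A, which agrees with the alert listed in the overview.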

Missing values

[Figures: three missing-value plots each for Dataset A and Dataset B; images not reproduced. Their captions follow.]

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
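
As a hedged sketch of what nullity correlation means, the code below computes a Pearson correlation between the presence indicators (1 = present, 0 = missing) of two columns. The column values are made up for illustration, `nullity_correlation` is a hypothetical helper, and the report's own plots may be computed differently.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation of presence indicators; None if either column never varies."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a fully present or fully missing column has no nullity variation
    return cov / sqrt(var_a * var_b)


# Made-up columns: Age and Cabin are missing in exactly the same rows.
age = [22.0, None, 26.0, None]
cabin = ["C85", None, "E46", None]
ticket = ["A/5 21171", "PC 17599", "113803", "373450"]  # never missing

same_pattern = nullity_correlation(age, cabin)
no_variation = nullity_correlation(age, ticket)
print(same_pattern, no_variation)
```

A value of 1 means the two columns are always missing together, -1 means one is present exactly when the other is missing, and a column with no missing values cannot be correlated this way.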

Sample

Dataset A

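(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
164 | 165 | 0 | 3 | Panula, Master. Eino Viljami | male | 1.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S
455 | 456 | 1 | 3 | Jalsevac, Mr. Ivan | male | 29.0 | 0 | 0 | 349240 | 7.8958 | NaN | C
822 | 823 | 0 | 1 | Reuchlin, Jonkheer. John George | male | 38.0 | 0 | 0 | 19972 | 0.0000 | NaN | S
789 | 790 | 0 | 1 | Guggenheim, Mr. Benjamin | male | 46.0 | 0 | 0 | PC 17593 | 79.2000 | B82 B84 | C
220 | 221 | 1 | 3 | Sunderland, Mr. Victor Francis | male | 16.0 | 0 | 0 | SOTON/OQ 392089 | 8.0500 | NaN | S
412 | 413 | 1 | 1 | Minahan, Miss. Daisy E | female | 33.0 | 1 | 0 | 19928 | 90.0000 | C78 | Q
779 | 780 | 1 | 1 | Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) | female | 43.0 | 0 | 1 | 24160 | 211.3375 | B3 | S
497 | 498 | 0 | 3 | Shellard, Mr. Frederick William | male | NaN | 0 | 0 | C.A. 6212 | 15.1000 | NaN | S
711 | 712 | 0 | 1 | Klaber, Mr. Herman | male | NaN | 0 | 0 | 113028 | 26.5500 | C124 | S
254 | 255 | 0 | 3 | Rosblom, Mrs. Viktor (Helena Wilhelmina) | female | 41.0 | 0 | 2 | 370129 | 20.2125 | NaN | S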

Dataset B

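(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
65 | 66 | 1 | 3 | Moubarek, Master. Gerios | male | NaN | 1 | 1 | 2661 | 15.2458 | NaN | C
137 | 138 | 0 | 1 | Futrelle, Mr. Jacques Heath | male | 37.0 | 1 | 0 | 113803 | 53.1000 | C123 | S
434 | 435 | 0 | 1 | Silvey, Mr. William Baird | male | 50.0 | 1 | 0 | 13507 | 55.9000 | E44 | S
519 | 520 | 0 | 3 | Pavlovic, Mr. Stefo | male | 32.0 | 0 | 0 | 349242 | 7.8958 | NaN | S
698 | 699 | 0 | 1 | Thayer, Mr. John Borland | male | 49.0 | 1 | 1 | 17421 | 110.8833 | C68 | C
490 | 491 | 0 | 3 | Hagland, Mr. Konrad Mathias Reiersen | male | NaN | 1 | 0 | 65304 | 19.9667 | NaN | S
712 | 713 | 1 | 1 | Taylor, Mr. Elmer Zebley | male | 48.0 | 1 | 0 | 19996 | 52.0000 | C126 | S
563 | 564 | 0 | 3 | Simmons, Mr. John | male | NaN | 0 | 0 | SOTON/OQ 392082 | 8.0500 | NaN | S
291 | 292 | 1 | 1 | Bishop, Mrs. Dickinson H (Helen Walton) | female | 19.0 | 1 | 0 | 11967 | 91.0792 | B49 | C
689 | 690 | 1 | 1 | Madill, Miss. Georgette Alexandra | female | 15.0 | 0 | 1 | 24160 | 211.3375 | B5 | S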

Dataset A

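(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
393 | 394 | 1 | 1 | Newell, Miss. Marjorie | female | 23.0 | 1 | 0 | 35273 | 113.2750 | D36 | C
724 | 725 | 1 | 1 | Chambers, Mr. Norman Campbell | male | 27.0 | 1 | 0 | 113806 | 53.1000 | E8 | S
629 | 630 | 0 | 3 | O'Connell, Mr. Patrick D | male | NaN | 0 | 0 | 334912 | 7.7333 | NaN | Q
121 | 122 | 0 | 3 | Moore, Mr. Leonard Charles | male | NaN | 0 | 0 | A4. 54510 | 8.0500 | NaN | S
374 | 375 | 0 | 3 | Palsson, Miss. Stina Viola | female | 3.0 | 3 | 1 | 349909 | 21.0750 | NaN | S
725 | 726 | 0 | 3 | Oreskovic, Mr. Luka | male | 20.0 | 0 | 0 | 315094 | 8.6625 | NaN | S
79 | 80 | 1 | 3 | Dowdell, Miss. Elizabeth | female | 30.0 | 0 | 0 | 364516 | 12.4750 | NaN | S
241 | 242 | 1 | 3 | Murphy, Miss. Katherine "Kate" | female | NaN | 1 | 0 | 367230 | 15.5000 | NaN | Q
647 | 648 | 1 | 1 | Simonius-Blumer, Col. Oberst Alfons | male | 56.0 | 0 | 0 | 13213 | 35.5000 | A26 | C
766 | 767 | 0 | 1 | Brewe, Dr. Arthur Jackson | male | NaN | 0 | 0 | 112379 | 39.6000 | NaN | C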

Dataset B

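(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
763 | 764 | 1 | 1 | Carter, Mrs. William Ernest (Lucile Polk) | female | 36.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S
509 | 510 | 1 | 3 | Lang, Mr. Fang | male | 26.0 | 0 | 0 | 1601 | 56.4958 | NaN | S
646 | 647 | 0 | 3 | Cor, Mr. Liudevit | male | 19.0 | 0 | 0 | 349231 | 7.8958 | NaN | S
475 | 476 | 0 | 1 | Clifford, Mr. George Quincy | male | NaN | 0 | 0 | 110465 | 52.0000 | A14 | S
146 | 147 | 1 | 3 | Andersson, Mr. August Edvard ("Wennerstrom") | male | 27.0 | 0 | 0 | 350043 | 7.7958 | NaN | S
667 | 668 | 0 | 3 | Rommetvedt, Mr. Knud Paust | male | NaN | 0 | 0 | 312993 | 7.7750 | NaN | S
765 | 766 | 1 | 1 | Hogeboom, Mrs. John C (Anna Andrews) | female | 51.0 | 1 | 0 | 13502 | 77.9583 | D11 | S
661 | 662 | 0 | 3 | Badt, Mr. Mohamed | male | 40.0 | 0 | 0 | 2623 | 7.2250 | NaN | C
447 | 448 | 1 | 1 | Seward, Mr. Frederic Kimber | male | 34.0 | 0 | 0 | 113794 | 26.5500 | NaN | S
725 | 726 | 0 | 3 | Oreskovic, Mr. Luka | male | 20.0 | 0 | 0 | 315094 | 8.6625 | NaN | S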

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.